Titanic Data Science Solutions
Workflow stages
The workflow goes through seven stages.
- Question or problem definition.
- Acquire training and testing data.
- Wrangle, prepare and clean the data.
- Analyze, identify patterns, and explore the data.
- Model, predict and solve the problem.
- Visualize, report, and present the problem solving steps and final solution.
- Supply or submit the results.
The workflow indicates the general sequence in which each stage may follow the other. However, there are exceptions.
- Combine multiple workflow stages. For example, we may analyze by visualizing data.
- Perform a stage earlier than indicated. For example, we may analyze data both before and after wrangling.
- Perform a stage multiple times in the workflow. The visualize stage may be used multiple times.
- Drop a stage altogether. We may not need the supply stage to productize or service-enable our dataset.
Question and problem definition
We are given a training set of samples listing passengers who did or did not survive the Titanic disaster. The task is to build a model that can determine, for a given test dataset that does not contain the survival information, whether each passenger in that test dataset survived.
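To make the shape of the task concrete, here is a minimal sketch of the labeled-training, unlabeled-test setup. The rows are invented toy data, and the field names `PassengerId`, `Sex`, and `Survived` are assumptions for illustration. The "model" is a trivial majority-class baseline, not the approach this workflow arrives at.

```python
from collections import Counter

# Toy rows invented for illustration; they are not real passenger records.
# Field names are assumed, not taken from the text above.
train = [
    {"PassengerId": 1, "Sex": "male", "Survived": 0},
    {"PassengerId": 2, "Sex": "female", "Survived": 1},
    {"PassengerId": 3, "Sex": "female", "Survived": 1},
    {"PassengerId": 4, "Sex": "male", "Survived": 0},
    {"PassengerId": 5, "Sex": "male", "Survived": 1},
]

# Test rows carry the same features but no "Survived" label.
test = [
    {"PassengerId": 6, "Sex": "female"},
    {"PassengerId": 7, "Sex": "male"},
]


def fit_majority(rows):
    """Return the most common label in the training rows (a trivial baseline)."""
    counts = Counter(row["Survived"] for row in rows)
    return counts.most_common(1)[0][0]


def predict(label, rows):
    """Assign the same predicted label to every unlabeled row."""
    return {row["PassengerId"]: label for row in rows}


majority_label = fit_majority(train)
predictions = predict(majority_label, test)
print(predictions)
```

Any real model replaces `fit_majority` and `predict`, but the contract stays the same: learn from rows that include the survival label, then emit one prediction per test passenger.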
We can also develop an early understanding of the problem domain. Here are the highlights to note.
- On April 15, 1912, the Titanic sank, killing 1502 out of 2224 passengers and crew. This translates to a 32% survival rate.
- One reason the disaster cost so many lives is that there were not enough lifeboats for the passengers and crew.
- Some groups of people, such as women, children, and the upper class, were more likely to survive than others.
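The 32% figure follows directly from the two counts quoted in the highlights. A quick check of the arithmetic, using only those numbers:

```python
total_aboard = 2224  # passengers and crew, as quoted in the highlights
deaths = 1502        # as quoted in the highlights

survivors = total_aboard - deaths
survival_rate = survivors / total_aboard

print(f"{survivors} survivors out of {total_aboard}: {survival_rate:.0%}")
```

This gives 722 survivors, a rate of roughly 32%. It is a useful baseline to keep in mind when judging how much any single feature shifts the odds of survival.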
Workflow goals
The data science solutions workflow solves for seven major goals.
Classifying. We may want to classify or categorize our samples. We may also want to understand the implications or correlation of different classes with our solution goal.
Correlating. Approach the problem based on available features within the training dataset. For example:
- Which features within the dataset contribute significantly to the solution goal?
- Statistically speaking, is there a correlation between a feature and the solution goal?
- As the feature values change, does the solution state change as well, and vice versa?
Correlation can be tested for both numerical and categorical features in the given dataset.
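As a rough sketch of what testing correlation can look like for each kind of feature, the example below computes a per-category survival rate for a categorical feature and a Pearson correlation coefficient for a numerical one. The rows are invented toy data, and the feature names `Sex`, `Fare`, and `Survived` are assumptions for illustration. Pearson correlation is one possible measure, chosen here for simplicity.

```python
from collections import defaultdict
from math import sqrt

# Toy rows invented for illustration; they are not real passenger records.
rows = [
    {"Sex": "female", "Fare": 80.0, "Survived": 1},
    {"Sex": "female", "Fare": 30.0, "Survived": 1},
    {"Sex": "female", "Fare": 10.0, "Survived": 0},
    {"Sex": "male", "Fare": 60.0, "Survived": 1},
    {"Sex": "male", "Fare": 8.0, "Survived": 0},
    {"Sex": "male", "Fare": 7.0, "Survived": 0},
]


def survival_rate_by(rows, feature):
    """Categorical feature: mean of the survival label within each category."""
    totals = defaultdict(int)
    survived = defaultdict(int)
    for row in rows:
        totals[row[feature]] += 1
        survived[row[feature]] += row["Survived"]
    return {key: survived[key] / totals[key] for key in totals}


def pearson(xs, ys):
    """Numerical feature: Pearson correlation coefficient between two sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


rate_by_sex = survival_rate_by(rows, "Sex")
fare_corr = pearson(
    [row["Fare"] for row in rows],
    [row["Survived"] for row in rows],
)

print(rate_by_sex)
print(round(fare_corr, 3))
```

A large gap between category rates, or a coefficient far from zero, suggests the feature is worth carrying into the modeling stage. Neither result establishes causation on its own.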